Statistics in Medicine
Wiley
Preprints posted in the last 30 days, ranked by how well they match Statistics in Medicine's content profile, based on 34 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Fayette, L.; Brendel, K.; Mentre, F.
Joint modelling of longitudinal data using non-linear mixed effects models and time-to-event outcomes provides a suitable framework to account for informative censoring when estimating biomarker dynamics and quantifying event risk using covariates and longitudinal trajectories. The usefulness of joint models in clinical research depends on data collection design, particularly for precise estimation of the association (link) parameter between the longitudinal and survival processes. However, optimal design strategies have so far been addressed separately for longitudinal and survival endpoints and remain unexplored for joint models. We propose two Fisher Information Matrix (FIM) computation methods for joint models, relying on Monte Carlo integration over observations combined with either Markov chain Monte Carlo or Adaptive Gaussian Quadrature to integrate over random effects. Their accuracy is assessed against clinical trial simulations in an oncological example based on the HORIZON III study with a tumour-growth-survival model including discrete and continuous covariates. We apply these methods to quantify the impact of follow-up duration, sampling richness, sample size, and covariate distribution on parameter uncertainty and test power. In our example, longitudinal-parameter uncertainty is barely affected by follow-up duration or sampling richness, whereas survival-parameter uncertainty decreases substantially from 1-year to 2-year follow-up. The number of subjects needed (NSN) to achieve <15% uncertainty on the link parameter is comparable for a 2-year rich design and a 3-year sparse design. Optimal covariate distributions are stable across designs and systematically improve test power, outperforming longer and richer but non-optimised designs. These FIM-based methods accurately predict uncertainty and test power, enabling design evaluation and NSN computation for joint-model-based clinical studies.
Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.
Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
Obeng-Gyasi, E.
Background: Mixture epidemiology deploys sophisticated estimators, Bayesian kernel machine regression with causal mediation analysis (BKMR-CMA), quantile G-computation (QGC), and parametric G-computation, alongside conventional regression. Comparative evaluations have assumed additive, non-mediated data-generating processes, leaving conditions under which estimator choice determines causal validity uncharacterized. Methods: We developed a simulation framework using military-relevant exposure distributions (metals, per- and polyfluoroalkyl substances [PFAS], polychlorinated biphenyls [PCBs]) and allostatic load (AL) across three deployment tiers, with parameters drawn from military occupational health and contamination literature. Four data-generating processes were specified as directed acyclic graphs: direct effects with confounding (M1), full mediation through AL (M2), synergistic AL-exposure interaction (M3), and collider structure (M4). We evaluated ordinary least squares (OLS), QGC, G-computation, and BKMR-CMA on bias, root mean squared error, and 95% confidence interval coverage across 500 Monte Carlo replications at n = 500 and n = 1,000. Results: No estimator dominated across all mechanisms. Under M1, OLS and G-computation produced near-identical modest positive bias; BKMR-CMA achieved lower root mean squared error through kernel shrinkage. Under M2, BKMR-CMA exhibited severe positive bias for AL (mean bias = +0.579 SD units; coverage = 32.8%). Under M3, BKMR-CMA was the only estimator achieving nominal 95% coverage for AL (95.2%), while regression-based approaches fell to 83.6%. Under M4, G-computation produced persistent bias and near-zero coverage for lead, reflecting structural non-identification. Conclusions: Estimator validity is fundamentally mechanism-dependent. 
Researchers should base estimator choice on explicit causal assumptions about whether AL functions as confounder, mediator, moderator, or collider, particularly in military and occupational cohorts. We provide a mechanism-to-estimator mapping for applied researchers.
Kleper, S. L.; Melamed, R. D.
Machine learning models for causal inference aim to adjust for confounding factors that are associated with both an exposure and an outcome and thereby create a spurious, biased association. However, these methods are rarely evaluated empirically to assess how well they mitigate such bias. Recent advances in knowledge representation, including both foundation models and knowledge graphs, could enrich these models, but rigorous evaluations are needed to assess their potential. Here, we ask whether enriching existing causal inference models with knowledge representations from foundation models can improve confounding control. Rather than using semi-simulated data to address this question, we focus on examples of real confounding: we emulate target randomized active comparator trials that are subject to confounding by indication. Our results can guide researchers aiming to develop or apply methods for discovering causal effects from observational data.
Shukla, N.; Bartington, S. E.; Hansell, A. L.; Lucas, T. C.
Background: In the absence of high-resolution response data, exposure-response modelling often relies on aggregated low-frequency exposure data, leading to loss of high-resolution information. Mixed Data Sampling (MIDAS) from econometrics offers an alternative but is limited due to its inability to make high-resolution predictions, inflexible likelihoods and penalised nonlinear functions, and limited visualization options. We propose a mixed-frequency Distributed Lag Non-linear Model (mf-DLNM) which can eliminate the need to aggregate exposure data in environmental epidemiology and provide high resolution predictions for time series studies. Methods: We evaluated the inference and predictive performance of the mf-DLNM. To evaluate its ability to estimate exposure-response relationships, we applied mf-DLNM and same-frequency (sf)-DLNM using data from the West Midlands, UK. Additionally, we compared the predictive performance of mf-DLNM with sf-DLNM and MIDAS across nine regions of England. As MIDAS cannot predict at the resolution of the predictor (daily), we compared the predictive performance of mf-DLNM and MIDAS at weekly resolution. To test the model's ability to predict high temporal resolution risk (daily), we compared sf-DLNM (with access to daily mortality counts) with mf-DLNM (with access only to weekly mortality counts). Results: In the West Midlands example, mf-DLNM performed comparably to sf-DLNM in estimating daily risk of temperature on respiratory mortality. Furthermore, mf-DLNM and MIDAS exhibited similar performance for weekly predictions. For high-resolution predictions, mf-DLNM and sf-DLNM showed nearly similar performance, despite mf-DLNM having access only to low-resolution response data. 
Conclusion: This mixed-frequency approach in environmental epidemiology overcomes the limitations of predicting health risks using aggregated exposure data and provides estimates of high-resolution outcomes in the absence of high-frequency health outcome datasets.
Lan, Y.; Wu, C.-Y.; Lin, H.-H.; Cohen, T.; Warren, J. L.
Pairwise analysis of genomic and spatial data offers opportunities to identify and estimate the associations between covariates and the transmission of pathogens between individuals. However, such pairwise analyses are computationally intensive, and may not be feasible to conduct given the high dyad count in even moderately sized datasets. Here we compare two approaches to increase the efficiency of pairwise analysis for large datasets. We quantify and compare the performance of divide-and-conquer Bayesian model fitting and pairwise case-control approaches for estimating associations between individual- and pair-level covariates and shared membership in a transmission cluster. We utilize a large dataset (n=4,154) of spatially-referenced, genomically-sequenced Mycobacterium tuberculosis isolates collected from a single city for this analysis. We find that the case-control approach produces unbiased estimates of effect sizes with expected credible interval coverage and is more robust than the divide-and-conquer method when effect sizes are large. Thus, we recommend using the case-control approach with at least three controls per case to downscale datasets for pairwise analysis when analysis of the entire dataset is not possible. This approach mitigates the computational challenges of pairwise Bayesian modeling on datasets that require significant computational resources while maintaining desired inferential properties. Author Summary: Pairwise analyses of large datasets to study pathogen transmission are computationally demanding because they typically require simultaneous analysis of each possible pair of individuals in a dataset; as datasets become larger these analyses often are not feasible to conduct even with access to high-performance computing resources. In this work, we compare a case-control approach and divide-and-conquer approaches for more efficient pairwise analysis of large datasets.
Using a large dataset of Mycobacterium tuberculosis isolates including genetic and spatial data, we investigate the performance of each method for estimating the associations between host covariates and genetic clustering of isolates. We find that the case-control approach is generally preferred over methods which first divide the data into subsets and then combine results. While additional extensions of these analyses are needed to test the generality of these findings to other data settings, this work provides a practical way forward for the pairwise analysis of large datasets to study pathogen transmission.
Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.
Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.
Kelly, R. E.
Null Hypothesis Significance Testing (NHST) remains the dominant paradigm for evaluating empirical research findings in medicine and the social sciences despite concerns about frequent misinterpretations of those findings. Achieving "statistical significance," the goal of NHST, often invites unrealistic conclusions. A broader Bayesian perspective, in which the credibility of a hypothesis is progressively readjusted in light of all sources of evidence, would be a helpful addition. For this purpose, the Hypothesis Race Model (HRM) provides an intuitive Bayesian approach that builds upon NHST concepts, helping to correct misunderstandings with minimal reeducation. The HRM is an extension of the Bayesian approach by Ioannidis in 2005 that helped to explain "why most published research findings are false." It is powerful enough to serve as the foundation for mathematical models to estimate and reduce the cost of empirical hypothesis testing.
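The abstract describes the HRM as an extension of the Bayesian argument in Ioannidis (2005). The sketch below illustrates only that baseline argument, not the HRM itself: the post-study probability that a "significant" finding is true, given the pre-study odds R, the significance level, and the power, with no bias term. The function name and the example inputs are illustrative assumptions.

```python
def positive_predictive_value(prior_odds: float, alpha: float = 0.05,
                              power: float = 0.80) -> float:
    """Post-study probability that a significant result reflects a true effect.

    Baseline formula without a bias term: PPV = power*R / (power*R + alpha),
    where R is the pre-study odds that the tested relationship is real.
    """
    true_positives = power * prior_odds   # proportional to true effects detected
    false_positives = alpha               # proportional to null effects flagged
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical pre-study odds, from even odds down to a long shot.
    for odds in (1.0, 0.25, 0.1, 0.01):
        ppv = positive_predictive_value(odds)
        print(f"pre-study odds {odds:>5}: PPV = {ppv:.3f}")
```

Lowering the pre-study odds while holding alpha and power fixed lowers the post-study probability, which is the sense in which a significant result alone does not establish credibility.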
Li, Q.; Chu, W.; Shahriyari, L.
This paper presents a unified six-state Continuous-Time Markov Chain (CTMC) framework for Chronic Kidney Disease (CKD) progression, with CKD stages 1-5 modeled as transient states and death as an absorbing state. Under a non-homogeneous CTMC formulation, we derive integral representations for transition probabilities, state distributions, sojourn times, and survival-related quantities. We then study the homogeneous case as a tractable baseline and provide explicit formulas for key quantities. Although the methodology is rooted in standard multi-state theory, these expressions are often left implicit in applied analyses; here they are written out explicitly within a unified CKD framework. We construct covariate-dependent transition rates through a proportional hazards structure, using the standard identification of cause-specific hazards with CTMC transition rates. We fit the time-homogeneous baseline model to 335,283 longitudinal observations from 21,100 synthetic electronic health record patients by maximum likelihood. In this synthetic cohort, the covariate model improves held-out log-likelihood per transition over the null model, with stable performance across 10-times-repeated 5-fold cross-validation, and reproduces the main population-level prevalence patterns. The transition-specific estimates can also be translated into sojourn-time and survival summaries. The model suggests that male sex is associated with faster progression across nearly all CKD transitions, and that hypertension shows a stage-dependent association, with lower estimated transition rates in early stages but a substantial acceleration of the Stage 4 to Stage 5 transition. Overall, the proposed framework provides a mathematically explicit approach for studying CKD trajectories from longitudinal health records.
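For the homogeneous case, the standard multi-state results are that transition probabilities are the matrix exponential of the generator, P(t) = exp(Qt), and that the mean sojourn time in a transient state i is 1/(-q_ii). A minimal sketch follows, assuming a progressive-only structure (each stage can move to the next stage or to death) and entirely hypothetical rates; neither assumption is taken from the paper.

```python
import math

# Hypothetical annual rates (not estimates from the paper).
PROGRESS = [0.10, 0.12, 0.15, 0.20]      # stage i -> stage i+1, for stages 1..4
DEATH = [0.01, 0.02, 0.03, 0.06, 0.15]   # stage i -> death, for stages 1..5


def build_generator(progress, death):
    """Six-state generator: indices 0..4 are CKD stages 1..5, index 5 is death."""
    n = len(death) + 1
    q = [[0.0] * n for _ in range(n)]
    for i in range(len(death)):
        if i < len(progress):
            q[i][i + 1] = progress[i]
        q[i][n - 1] = death[i]
        q[i][i] = -sum(q[i])             # rows of a generator sum to zero
    return q                             # death row stays all zeros (absorbing)


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=20):
    """P(t) = exp(Q t) via a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):           # undo the scaling
        result = mat_mul(result, result)
    return result


def mean_sojourn(q, state):
    """Expected time spent in a transient state per visit."""
    return 1.0 / -q[state][state]


if __name__ == "__main__":
    Q = build_generator(PROGRESS, DEATH)
    P5 = transition_matrix(Q, 5.0)
    print("5-year survival from stage 1:", 1.0 - P5[0][5])
    print("mean sojourn in stage 1 (years):", mean_sojourn(Q, 0))
    # From stage 5 the only exit is death, so survival has a closed form.
    print("stage 5 check:", 1.0 - P5[4][5], "vs", math.exp(-DEATH[4] * 5.0))
```

In the covariate-dependent version described in the abstract, each rate would be multiplied by a proportional-hazards factor before the generator is assembled; that step is omitted here.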
Chen, P.; Bauer, R. J.; Li, Y.
Population pharmacokinetic (popPK) models are commonly developed using ordinary differential equations (ODEs) to describe deterministic concentration-time profiles, with unexplained variability typically attributed to interindividual variability or residual error. When model misspecification is present, system-level deviations may be absorbed into these conventional variability terms, making the source and magnitude of model inadequacy difficult to assess quantitatively. Stochastic differential equations (SDEs) provide an alternative framework by introducing an explicit system-noise component into the structural model, allowing model-data mismatch to be evaluated more directly. However, historical implementation of SDE-based models in NONMEM has been technically challenging. The availability of the Fortran plug-in subroutine SDE.f90 substantially lowers this barrier and enables more practical implementation of SDE-based models in NONMEM. In this work, SDE-based nonlinear mixed-effects models were evaluated as a quantitative diagnostic framework for probing popPK model misspecification. The SDE.f90 implementation was first verified using simulated one-compartment intravenous bolus datasets with stochastic process noise. Additional simulation-estimation scenarios were then conducted under intentionally misspecified structural or stochastic assumptions, including time-varying elimination, compartmental misspecification, and residual error misspecification. Across these scenarios, the estimated system-noise parameter was generally sensitive to misspecification, with larger values usually associated with greater structural or stochastic mismatch. SDE-based modeling also helped partially separate system-level variability from residual variability and, in selected settings, supported localization of misspecification to specific model components, thereby helping guide model refinement. 
Overall, SDE-based popPK modeling is a useful addition to the pharmacometric diagnostic toolbox, with system-noise estimates best interpreted together with structural model evaluation, residual diagnostics, parameter behavior, and pharmacologic plausibility.
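To make the idea of an explicit system-noise component concrete, here is a minimal Euler-Maruyama simulation of a one-compartment intravenous bolus model. It is a sketch under stated assumptions, not the SDE.f90 implementation or NONMEM code: the noise is assumed to enter additively on the drug amount, dA = -ke*A dt + sigma_w dW, and all parameter names and values are hypothetical.

```python
import math
import random


def simulate_one_compartment_sde(dose, volume, ke, sigma_w, t_end, dt, seed=0):
    """Euler-Maruyama path of concentration for dA = -ke*A dt + sigma_w dW.

    sigma_w is the (assumed additive) system-noise scale; sigma_w = 0
    reduces the scheme to the explicit Euler solution of the ODE.
    Additive noise can push the amount below zero at low concentrations.
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    amount = dose
    conc = [amount / volume]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))            # Brownian increment
        amount = amount - ke * amount * dt + sigma_w * dw
        conc.append(amount / volume)
    return conc


if __name__ == "__main__":
    ode_like = simulate_one_compartment_sde(100.0, 10.0, 0.2, 0.0, 10.0, 0.01)
    noisy = simulate_one_compartment_sde(100.0, 10.0, 0.2, 1.0, 10.0, 0.01, seed=1)
    print("ODE-like concentration at t=10:", ode_like[-1])
    print("exact ODE concentration at t=10:", 10.0 * math.exp(-2.0))
    print("one noisy path at t=10:", noisy[-1])
```

In an estimation setting the system-noise scale is a parameter to be estimated; the abstract's diagnostic idea is that a larger estimate tends to signal greater mismatch between the structural model and the data.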
Hagan, J.
Background. Cross-validation (CV) is widely used to estimate predictive performance, but can overestimate performance when applied at the observation level to repeated-measures data. When continuous predictor variables are measured repeatedly within subjects and the binary outcome is defined at the subject level, naive observation-level CV introduces data leakage through within-subject dependence, producing optimistically biased estimates of the area under the receiver operating characteristic curve (AUROC). The magnitude of this bias and the performance of alternative partitioning strategies have not been formally characterized for this data structure. Methods. Three CV strategies were compared for estimating subject-level AUROC in ridge logistic regression models: naive observation-level 10-fold CV, subject-level 10-fold CV, and leave-one-cluster-out (LOCO) CV. The framework was applied to a motivating clinical dataset of daily oxygenation measures and retinopathy of prematurity outcomes among 101 extremely low birth weight infants. A factorial simulation study was conducted across 162 parameter combinations varying cluster count (20-150), intraclass correlation (0.1-0.5), within-cluster autocorrelation (0.2-0.8), and outcome prevalence (10-35%), with 500 simulated datasets per condition (76,389 valid datasets total). Results. In the motivating dataset, naive CV produced optimism of +0.078 AUROC units for severe ROP prediction (15 events, 101 subjects) and +0.031 for any ROP prediction (48 events). Subject-level 10-fold CV closely approximated LOCO (deviation ≤ 0.015). In the simulation, naive CV optimism ranged from +0.039 to +0.204 across all conditions, increasing monotonically with higher ICC, higher autocorrelation, fewer clusters, and lower event rates. Subject-level 10-fold CV was essentially unbiased relative to LOCO across all 162 conditions (mean absolute deviation = 0.002). Conclusions.
Naive observation-level CV meaningfully overestimates discriminative performance in the repeated-measures binary outcome setting and should not be used. Subject-level CV partitioning effectively eliminates this bias. Accordingly, subject-level partitioning should be considered essential, not optional, when validating prediction models using repeated-measures data with subject-level outcomes.
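The difference between the two partitioning strategies can be shown without fitting any model. The sketch below, using only the standard library, builds observation-level and subject-level folds for a toy repeated-measures layout and counts, for each fold, how many held-out subjects also contribute rows to the training set. Function names and the toy layout (12 subjects with 5 rows each, 4 folds) are illustrative assumptions, not the study's setup.

```python
import random


def observation_level_folds(n_obs, k, seed=0):
    """Naive CV: shuffle rows and deal them into k folds, ignoring subjects."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subject_level_folds(subject_ids, k, seed=0):
    """Grouped CV: assign whole subjects to folds, then collect their rows."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for row, s in enumerate(subject_ids):
        folds[fold_of[s]].append(row)
    return folds


def leaked_subjects(subject_ids, folds):
    """Per fold, the number of held-out subjects that also appear in training."""
    leaks = []
    for fold in folds:
        held_out = set(fold)
        test_subjects = {subject_ids[r] for r in held_out}
        train_subjects = {s for r, s in enumerate(subject_ids) if r not in held_out}
        leaks.append(len(test_subjects & train_subjects))
    return leaks


if __name__ == "__main__":
    ids = [s for s in range(12) for _ in range(5)]   # 12 subjects x 5 rows
    naive = observation_level_folds(len(ids), k=4)
    grouped = subject_level_folds(ids, k=4)
    print("naive leakage per fold:  ", leaked_subjects(ids, naive))
    print("grouped leakage per fold:", leaked_subjects(ids, grouped))
```

Subject-level folds have zero leakage by construction. With scikit-learn available, `sklearn.model_selection.GroupKFold` provides the same kind of split.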
Islam, N.; Luo, C.; Tong, J.; Weller, G.; Polleya, D. A.; Kent, A.; Bair, S.
Introduction: In analyses of time-to-event data, clinical characteristics can have non-linear impacts on survival outcomes, and understanding this dynamic behavior is crucial for producing real-world evidence (RWE). Nonetheless, estimating these dynamic effects is inherently challenging when utilizing real-world data (RWD), especially since sharing individual-level patient data (IPD) is heavily restricted due to regulatory limitations. Additionally, computational difficulties are exacerbated by the high dimensionality, inter-dependency, rarity, sparsity, and scarcity of features. While data augmentation through collaboration across multiple sites might address these challenges, such collaboration is often infeasible and hindered by regulatory measures that protect patient privacy, thereby preventing the sharing of IPD between sites. Objectives: To address this challenge, we propose a privacy-preserving regularized algorithm that eliminates the necessity of aggregating any protected health information across sites. This algorithm employs a penalized federated additive model utilizing piecewise exponential survival (FAMES) data and estimates non-linear effects of features while accounting for non-varying confounding effects. The model is flexible and can accommodate both multiple and multivariate smooth effects simultaneously. Methods: The proposed model transforms survival data into a piecewise exponential data (PED) structure and casts the semi-parametric optimization problem into a generalized additive modeling framework assuming a Poisson distribution. The model uses orthonormal splines to approximate non-linear effects and incorporates L2-norm based penalty terms to control the smoothness and goodness-of-fit of these effects. The algorithm is optimized using site-specific aggregated summary statistics and is solved iteratively through the Newton-Raphson method.
Results: The model is employed to assess the smooth effects of clinical features, such as age and numeric laboratory values, on overall survival using RWD from approximately 874 newly diagnosed Acute Myeloid Leukemia (AML) patients treated at seven distinct sites in the United States. The model exhibited non-linear smooth effects for lactate dehydrogenase, platelets, and others, underscoring their strong association with disease prognosis. The model demonstrates a lossless property, providing estimates of smooth and fixed effects that are comparable to those derived from the pooled PED. Additionally, the inference of parameters for testing the nullity of effects remains consistent. This model is communication-efficient, necessitating roughly twelve rounds of communication across sites. Conclusion: We anticipate that this model can facilitate multisite collaboration and enable smaller sites to participate in generating and validating RWE, especially for rare diseases. While the model was applied within the context of AML, it is disease-agnostic and can be implemented in any other clinical context and across various sites globally without losing any generality.
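The piecewise exponential data (PED) transformation mentioned in the Methods can be sketched as follows: each subject contributes one row per interval in which they were at risk, carrying the exposure time and an event indicator, so that a Poisson model with a log-exposure offset reproduces the piecewise exponential likelihood. The sketch covers only the covariate-free case, where the interval hazard is simply events divided by exposure, and shows that summing per-site totals gives the same result as pooling rows. It is not the FAMES algorithm (no splines, penalties, or Newton-Raphson updates), and all names and data are hypothetical.

```python
def to_ped(times, events, cuts):
    """Expand (time, event) pairs into PED rows.

    cuts are increasing interval end points; interval j is (cuts[j-1], cuts[j]],
    with the first interval starting at 0.
    """
    rows = []
    for sid, (t, d) in enumerate(zip(times, events)):
        start = 0.0
        for j, end in enumerate(cuts):
            if t <= start:                      # no longer at risk
                break
            rows.append({
                "id": sid,
                "interval": j,
                "exposure": min(t, end) - start,
                "event": int(d == 1 and t <= end),
            })
            start = end
    return rows


def interval_totals(rows, n_intervals):
    """Aggregated summary statistics: events and exposure per interval."""
    events = [0] * n_intervals
    exposure = [0.0] * n_intervals
    for r in rows:
        events[r["interval"]] += r["event"]
        exposure[r["interval"]] += r["exposure"]
    return events, exposure


def hazards_from_totals(events, exposure):
    """Covariate-free Poisson MLE of each interval's constant hazard."""
    return [e / x if x > 0 else float("nan") for e, x in zip(events, exposure)]


if __name__ == "__main__":
    cuts = [1.0, 2.0, 3.0]
    # Two hypothetical sites that share only per-interval totals.
    site_a = interval_totals(to_ped([2.5, 1.0], [1, 0], cuts), len(cuts))
    site_b = interval_totals(to_ped([2.0], [1], cuts), len(cuts))
    events = [a + b for a, b in zip(site_a[0], site_b[0])]
    exposure = [a + b for a, b in zip(site_a[1], site_b[1])]
    print("federated hazards:", hazards_from_totals(events, exposure))

    pooled = interval_totals(to_ped([2.5, 1.0, 2.0], [1, 0, 1], cuts), len(cuts))
    print("pooled hazards:   ", hazards_from_totals(*pooled))
```

With covariates and smooth terms, the shared quantities would instead be site-level gradient and Hessian contributions at each iteration, which is presumably what the abstract's "aggregated summary statistics" refers to; that part is not reproduced here.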
Ockenden, E. S.; Anguajibi, V.; Mpooya, S.; Ntegeka, B.; Mugume, T.; Nabatte, B.; Kabatereine, N. B.; Noble, A.; Chami, G. F.
Liver fibrosis is a major cause of death in low- and middle-income country contexts. In rural, poor areas of sub-Saharan Africa, schistosomiasis is an underestimated cause of liver fibrosis. Despite the need for increased diagnostic capacity for schistosomiasis-related liver fibrosis, there are no automated, clinically-validated tools to diagnose schistosomiasis-related liver fibrosis. We present SchistoTrackNet which is, to our knowledge, the first deep learning-based model for distinguishing distinct presentations of schistosomiasis-related liver fibrosis of varying severity. Ultrasound images from 1533 participants aged 5-84 years from three districts in rural Uganda were used to train and evaluate the presented models. The models were evaluated by assessing failure cases and by comparing results with re-readings performed by sonographers experienced in diagnosis of schistosomiasis morbidity. Our models show potential to enable automated reading of ultrasound images for schistosomiasis-related liver fibrosis to allow large-scale surveillance of schistosomiasis morbidity and contribute towards the World Health Organization target to eliminate schistosomiasis as a public health problem.
Alrefae, T. A.; Pons-Salort, M.; Donnelly, C. A.; Lambert, B.; Kamau, E.
Serological assays remain the standard experimental approach for estimating the cumulative incidence of a pathogen and monitoring population immunity. The predominant approach for analysing serum titration data from virus neutralisation assays uses a nearly century-old interpolation-based method which neglects inherent imperfections in the assay and produces estimates with no measure of uncertainty. We introduce a two-part Bayesian modelling framework to estimate the underlying antibody concentrations in the raw serum samples taken from serosurveyed individuals, to improve the interpretation of serological data over age. First, we develop a mechanistic Bayesian model for serum antibody titration data that estimates latent antibody concentrations while accounting for assay variability and quantifying uncertainty. Second, we propagate this uncertainty into an age-structured serocatalytic model by integrating over posterior draws of individual antibody concentrations, allowing joint inference on latent serostate membership, force of infection, and serological waning rate. We use this framework to explore the dynamics of infection and immunity for three enterovirus serotypes: enteroviruses A71 (EV-A71) and D68 (EV-D68) and coxsackievirus A6 (CVA6). These serotypes are leading causes of outbreaks of severe respiratory illness and hand, foot, and mouth disease. Applying these approaches to three cross-sectional serosurveys, we estimated consistently higher and more persistent antibody concentrations throughout life for EV-D68 compared to EV-A71 and CVA6. Our analysis suggests that the proportion of recently infected individuals (i.e. individuals with high estimated antibody concentration levels given their age) peaks around 25% by age 7 years for both EV-A71 and CVA6 before gradually declining with age. In contrast, for EV-D68 the inferred proportion of the population in the infected state exceeds 50% by age 9 years and continues to grow with age.
We also estimate that EV-D68 antibody concentration levels are higher than those of the other two serotypes, with the force of infection estimated to be highest in early childhood and declining more gradually with age than for EV-A71 and CVA6. These estimates differ from those previously reported in the literature. Our inferential framework uncovers the wide-ranging variation in antibody levels that is often obscured by conventional endpoint titre estimation methods. We demonstrate that our framework can infer infection rates without relying on predetermined seropositivity cut-offs and without making explicit assumptions of virus-specific infection mechanisms. Author summary: Serological tests measure antibody levels in blood to show how widely a virus has spread and how well populations are protected. Titre-based tests dilute blood samples in steps, mix these dilutions with virus, and add the mixture to living cells; the titre is the highest dilution where antibodies still protect cells from infection. Traditional analyses overlook test imperfections. We present a new two-part Bayesian framework to estimate antibody levels and track age-related exposure to infection. First, we estimate underlying antibody concentrations while accounting for uncertainty, then use these estimates in another model to infer age-specific transmission of three common viruses - EV-A71, EV-D68, and CVA6. Our results show that EV-D68 infections may be more common, especially in children, compared to the other viruses. This new approach provides a clearer picture of the dynamics of seroconversion, without relying on arbitrary thresholds, helping to improve public health monitoring and responses.
Morgenstern, C.; Khurana, M. P.; Naidoo, T.; Rawson, T.; Cori, A.; Duchene, D. A.; Ferguson, N. M.; Kraemer, M. U. G.; Bhatt, S.
The incubation period, the interval between pathogen exposure and symptom onset, is a critical epidemiological parameter for follow-up policy and outbreak response, yet individual-level exposure data remain scarce, especially early in outbreaks. For most priority pathogens, only summary statistics are available because sharing of individual-level data can be sensitive. Here we introduce a Bayesian hierarchical framework that jointly models individual-level observations and published summary statistics under a unified federated analysis framework. Simulation studies demonstrate that the method accurately recovers incubation period distributions across a range of data availability scenarios, generally outperforming approaches that use published summary statistics alone. Applying the framework to 18 pathogens, including 10 priority pathogens classified to have outbreak potential by the World Health Organization, we find substantial between-study heterogeneity in incubation period estimates, including by outbreak country for SARS-CoV-1, variants of concern for COVID-19, and exposure setting for typhoid fever. These estimates, together with the curated dataset and modelling framework in our associated R package ddsynth, provide a reproducible foundation for improved incubation period estimation and synthesis across pathogens of epidemic concern. Our framework enables robust and rapid estimation of incubation periods during new outbreaks.
Irlmeier, R.; Jin, Z.; Ye, F.
Background: Simon two-stage designs for binary endpoints and their time-to-event analogues, including the Kwak and Jung method, rely on a fixed null benchmark. Their Type I error control is valid only when that benchmark is correctly specified. In practice, historical benchmarks are often inconsistent due to small samples, population heterogeneity, changing eligibility criteria, and evolving standards of care. Even modest misspecifications can substantially inflate the Type I error rate, leading to costly advancement of ineffective treatments. Methods: We propose the Interval-Null Robust (INR) two-stage design framework that accounts for uncertainty in the historical null benchmark. We define the null hypothesis as a plausible range of clinically uninteresting values: $p \in [p_{0L}, p_{0U}]$ for binary endpoints and $\lambda \in [\lambda_{0L}, \lambda_{0U}]$ (or equivalent survival probabilities) for time-to-event endpoints. Type I error is controlled uniformly over the full null interval: $\sup_{\theta \in \Theta_0} \Pr_{\theta}(\mathrm{Go}) \le \alpha$. Under the monotonicity of the Go probability, the supremum occurs at the least favorable null configuration ($p_{0U}$ and $\lambda_{0L}$), but the design is not reduced to a point-null formulation. The interval defines the uncertainty set for error control and is used in selecting among feasible designs through robust criteria such as worst-case regret or minimal average expected sample size. Results: Across representative planning scenarios for both endpoint types, classic designs calibrated to a single benchmark exhibit substantial Type I error inflation when the true null parameter exceeds the assumed planning value. INR designs maintain the nominal Type I error rate across the full null interval, directly addressing this vulnerability to benchmark misspecification. The robustness-efficiency trade-off can be managed through design constraints and robust optimization criteria while preserving uniform Type I error control.
Conclusions: INR two-stage designs offer a transparent framework for addressing historical control uncertainty in single-arm Phase II trials. By replacing reliance on a fixed benchmark assumption with a more realistic interval of clinically plausible null values, the INR design reduces the risk of false-positive Go decisions caused by benchmark misspecification. INR applies to both binary and time-to-event endpoints and is implemented in the open-source INRDesign R package and accompanying interactive Shiny app.
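The uniform error-control idea for the binary endpoint can be illustrated with a minimal sketch. It assumes a standard Simon-type rule (stop for futility if stage 1 responses are at most `r1`, declare Go if total responses exceed `r`). The design values `n1=13, r1=3, n=43, r=12` and the null interval `[0.15, 0.25]` are illustrative choices, not taken from the abstract, and the function names are hypothetical rather than part of the INRDesign package.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 1.0
    return sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))

def go_probability(p, n1, r1, n, r):
    """Probability of a Go decision under response rate p.

    Stage 1 enrolls n1 patients and continues only if responses > r1;
    Go is declared if total responses across all n patients exceed r.
    """
    n2 = n - n1
    return sum(
        binom_pmf(x1, n1, p) * binom_sf(r - x1, n2, p)
        for x1 in range(r1 + 1, n1 + 1)
    )

def worst_case_type1(p0L, p0U, n1, r1, n, r, grid=101):
    """Largest Go probability over a grid spanning the null interval."""
    ps = [p0L + (p0U - p0L) * i / (grid - 1) for i in range(grid)]
    return max(go_probability(p, n1, r1, n, r) for p in ps)

# Illustrative design (assumed values, not from the abstract).
design = dict(n1=13, r1=3, n=43, r=12)
point_null = go_probability(0.20, **design)            # calibrated at a single benchmark
interval_sup = worst_case_type1(0.15, 0.25, **design)  # supremum over the null interval
print(f"Go probability at p=0.20: {point_null:.4f}")
print(f"Worst case over [0.15, 0.25]: {interval_sup:.4f}")
```

Because the Go probability increases with the response rate, the grid maximum coincides with the value at the upper end of the interval, which is why a design calibrated only at the midpoint can overshoot its nominal error rate.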
Renner, P.; Polemiti, E.; Jentsch, M.; Banks, J. R.; Cleff, D.; Siehl, S.; Dallavalle, M.; Lett, T.; Buck, C.; Castell, S.; Frost, J.; Grabe, H.; Keil, T.; Harth, V.; Kettlitz, R.; Krist, L.; Leitzmann, M.; Mikolajczyk, R.; Naaouf, N.; Obi, N.; Peters, A.; Schneider, A.; Wolf, K.; Nees, F.; Twardziok, S. O.; Marquand, A.; Hese, S.; Schepanski, K.; Schumann, G.; environMENTAL consortium
Environmental exposures are increasingly examined in relation to mental health, yet large-scale epidemiological analyses remain constrained by fragmented geospatial data, heterogeneous spatial and temporal resolutions, and privacy-preserving linkage requirements, limiting systematic investigation of multiple environmental domains at the population level. We present environMAP, a harmonised set of analysis-ready environmental exposure layers derived from open, global sources. environMAP spans the built environment, green and blue spaces, light exposure (solar radiation and night-time light), terrain, weather and extremes, and air pollution. We document data provenance, spatial buffers, preprocessing, projection alignment, and metadata, and provide a reproducible workflow for privacy-preserving linkage to cohort residential locations. To demonstrate utility, we linked environMAP to >200,000 adults in the German National Cohort (NAKO) and summarised self-reported lifetime doctor-diagnosed depression across exposure gradients using sex-stratified descriptive analyses. Gradients were interpretable and broadly consistent with prior evidence, supporting feasibility, scalability, and hypothesis generation. The framework is adaptable to other outcomes, cohorts, and regions.
Kleinbloesem, C. H.; Braal, C. L.
Background: Classical pharmacokinetic-pharmacodynamic (PK/PD) theory models the exposure-effect relationship in two dimensions: magnitude and time. Rate-dependent toxicity has been documented across therapeutic domains but never formalised as a conserved biological constraint. Methods: We developed the Human Adaptive Rate Limit (HARL) framework, formalising the maximum tolerable velocity as $|dS/dt|_{\max} = \sigma_{\max} / \tau$. We validated HARL across five domains using published trial data and a reanalysis of the longitudinal biomarker data from the 202-patient CAR-T cohort of Wei et al. (2023). An 8-ODE quantitative systems pharmacology model guided biomarker selection. Early biomarker velocities (maximum positive slope within days 0-5) were computed for ferritin and D-dimer. Patients were classified as high-risk only if both velocities exceeded their thresholds (dual-velocity classifier). Thresholds were identified by grid-search optimisation of the Youden index and assessed by leave-one-out cross-validation. Findings: A prospective crossover study (Kleinbloesem 1987, n=8) demonstrated that matched steady-state nifedipine concentrations produce divergent haemodynamic responses depending solely on rate of rise, anticipating the dose-related mortality signal subsequently reported across ~8350 patients with coronary heart disease (Furberg 1995), a meta-analysis that was itself debated. Convergent evidence spans haematology (CHOIR, 1432 patients, hazard ratio [HR] 1.34 [1.03-1.74] for aggressive Hb correction), radiation (dose and dose-rate effectiveness factor [DDREF] 1.5-2.0), and infusion pharmacology. In the CAR-T cohort, high-risk classification (ferritin >232 ng/mL per day AND D-dimer >1.21 mg/L per day) predicted severe CRS with 100% sensitivity (~78% specificity) in safety rule-out mode and 91.1% sensitivity (93.6% specificity, AUC 0.95 [95% CI 0.91-0.98]) in Youden-optimised mode. Median kinetic lead time was 4 days (range 3-7) before clinical decompensation.
Interpretation: Biological tolerability is three-dimensional. HARL unifies rate-dependent toxicity across domains spanning minutes to weeks. MTDyn, which specifies a target level and an allowable rate of change, should supplement conventional dose-response assessment.
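The dual-velocity rule described above can be sketched as follows. The two thresholds are the ones quoted in the abstract. The abstract does not say how the slope was computed, so this sketch assumes finite differences between consecutive samples; the function names and the patient series are hypothetical and made up purely for illustration.

```python
FERRITIN_THRESHOLD = 232.0  # ng/mL per day (from the abstract)
D_DIMER_THRESHOLD = 1.21    # mg/L per day (from the abstract)

def max_positive_slope(days, values, window=(0, 5)):
    """Largest positive per-day slope between consecutive samples in the window.

    Assumption: the slope is a finite difference between adjacent
    measurements; the original analysis may have used another estimator.
    """
    pts = sorted((d, v) for d, v in zip(days, values) if window[0] <= d <= window[1])
    slopes = [
        (v2 - v1) / (d2 - d1)
        for (d1, v1), (d2, v2) in zip(pts, pts[1:])
        if d2 > d1
    ]
    return max((s for s in slopes if s > 0), default=0.0)

def is_high_risk(ferritin_velocity, d_dimer_velocity):
    """High risk only if BOTH velocities exceed their thresholds."""
    return (ferritin_velocity > FERRITIN_THRESHOLD
            and d_dimer_velocity > D_DIMER_THRESHOLD)

def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

# Hypothetical patient (invented values, not from the cohort).
days = [0, 1, 2, 3, 4, 5]
ferritin = [500, 600, 900, 1400, 1500, 1450]  # ng/mL
d_dimer = [0.5, 0.8, 2.5, 3.0, 3.2, 3.1]      # mg/L

v_ferritin = max_positive_slope(days, ferritin)
v_d_dimer = max_positive_slope(days, d_dimer)
print(v_ferritin, v_d_dimer, is_high_risk(v_ferritin, v_d_dimer))

# Youden index implied by the reported Youden-optimised operating point.
print(round(youden_index(0.911, 0.936), 3))
```

Requiring both velocities to exceed their thresholds trades some sensitivity for specificity compared with flagging on either marker alone.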
Liang, M.; Wu, R.; Xiao, F.; Li, X.
Mendelian randomization (MR) is widely used to draw causal conclusions in the presence of unmeasured confounding, but most MR analyses focus on average treatment effects and rely on strong assumptions. For precision medicine, the primary target is instead the individualized treatment effect (ITE); yet in MR, such effects are not point-identified under core IV assumptions, and valid inference is particularly challenging. We therefore propose a robust partial identification inference framework for ITE under MR that allows multiple instruments. Under minimal causal assumptions, we derive a sharp inference procedure for the intersection bounds of ITE by adopting a multiplier bootstrap procedure with data-adaptive bootstrap distribution shifting and heterogeneous variance adjustment. In theory, we prove that the proposed method achieves nominal coverage and asymptotic sharpness. Further, we extend the procedure to tolerate possible invalid IVs under a minimal proportion rule assumption by aggregating over instrument subsets while preserving coverage. Simulation studies demonstrate that the proposed methods attain nominal coverage and substantially shorter intervals than existing procedures. We illustrate the framework using data from the Alzheimer's Disease Neuroimaging Initiative to assess heterogeneous causal effects of TREM2 expression on Alzheimer's disease risk across education-defined subgroups.
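The notion of intersection bounds can be illustrated with a deliberately simplified sketch. It assumes each instrument yields a known lower and upper bound on the effect and that all instruments are valid, so the identified set is the overlap of the per-instrument intervals. It ignores sampling uncertainty entirely, which is the problem the bootstrap procedure in the abstract is designed to handle. The function name and the numeric bounds are hypothetical.

```python
def intersect_bounds(bounds):
    """Intersect per-instrument (lower, upper) bounds on an effect.

    Returns the (max lower, min upper) interval, or None when the
    intervals do not overlap, which would suggest that at least one
    instrument's bound is incompatible with the others.
    """
    lower = max(lo for lo, _ in bounds)
    upper = min(hi for _, hi in bounds)
    if lower > upper:
        return None
    return (lower, upper)

# Hypothetical population-level bounds from three instruments.
per_instrument = [(-0.4, 0.6), (-0.1, 0.9), (-0.3, 0.5)]
print(intersect_bounds(per_instrument))
```

Each extra valid instrument can only tighten the interval, since the intersection never extends beyond any single instrument's bounds.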
Long, H.; Gada, L.; Murray, L.; Laurence, T.; Hayward, A.; Finnie, T.
Sex work is diverse and includes a broad range of people and settings. Over the last thirty years, a large proportion of public health emergencies of international concern (PHEIC) have involved infections transmitted through sexual or close contact and in sexual networks (WHO 2024). Sex workers can face increased disadvantage in relation to these public health emergencies. Given the significant health inequalities sex workers can face, they should be eligible to receive targeted and tailored health support to reduce health protection risks (Hester 2019; Jeal and Salisbury 2004a). However, they are often not explicitly eligible for targeted and tailored support due to a lack of information on incidence, prevalence of disease, and even more basic data such as reliable estimates of the number of sex workers in the UK. Accordingly, the aim of this paper is to determine a population size estimate, with uncertainty, that is more robust than those currently available. In this study, we apply Bayesian Evidence Synthesis to bring together historic estimation efforts with recent ONS National Population Estimates and Genito-Urinary Medicine Clinics Attendance Data (GUMCAD) from the UK Health Security Agency (UKHSA). A key feature of our model is the embedding of uncertainty from each input study in model priors, hence propagating it through to our final estimate. The Bayesian evidence synthesis model estimated a total of 84,000 sex workers in the United Kingdom (95% credible interval: 49,000-130,000), representing 0.121% of the current UK population.